-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: cache loaded index metadata #1831
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging this deadlock in GCS has made me a little paranoid I think.
rust/lance/src/dataset.rs
Outdated
/// Cache index metadadta of a version of the dataset. | ||
pub(crate) index_metadata_cache: Arc<futures::lock::Mutex<Option<CachedIndexMetadata>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not include this as part of session
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Was thinking that this cache has the same lifetime with the Dataset
.
But if put into Session, we can actually maintain the cache for multiple opened dataset (i.e., per request). Seems a better deal. I will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put it in session, you can then just re-use the existing metadata cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The session
does have the same lifetime as Dataset
doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Session object can be passed in as external managed memory (outside of Dataset)
https://github.com/lancedb/lance/blob/main/rust/lance/src/dataset/builder.rs#L162
rust/lance/src/index.rs
Outdated
let manifest_file = self.manifest_file(self.version().version).await?; | ||
read_manifest_indexes(&self.object_store, &manifest_file, &self.manifest).await | ||
async fn load_indices<'a>(&'a self) -> Result<Vec<IndexMetadata>> { | ||
let mut indices = self.index_metadata_cache.lock().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Async locks make me a little uneasy. Could we avoid it by doing something like...
Grab lock
If available and correct version then return
Release lock
Load indices from file
Grab lock
If not set, then set
Release lock
This could lead to multiple threads loading the indices but that should be rare and harmless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg.
043933f
to
f8dafe6
Compare
self.misses.fetch_add(1, Ordering::Relaxed); | ||
} | ||
} | ||
|
||
#[derive(Clone)] | ||
pub struct IndexCache { | ||
// TODO: Can we merge these two caches into one for uniform memory management? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downcasting/upcasting is weird. You can try. I got spun around and gave up on it at the time simply because the PR had lots of other stuff.
Closes #1657